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Abstract 

Cj i Adaptive subtraction is a key element in predictive multiple-suppression methods. It min- 

O ; imizes misalignments and amplitude differences between modeled and actual multiples, and 

thus reduces multiple contamination in the dataset after subtraction. Due to the high cross- 
^ ■ correlation between their waveform, the main challenge resides in attenuating multiples without 

(3JQ[ distorting primaries. As they overlap on a wide frequency range, we split this wide-band problem 

into a set of more tractable narrow-band filter designs, using a ID complex wavelet frame. This 
decomposition enables a single-pass adaptive subtraction via complex, single-sample (unary) 
K ■ Wiener filters, consistently estimated on overlapping windows in a complex wavelet transformed 

domain. Each unary filter compensates amplitude differences within its frequency support, and 
can correct small and large misalignment errors through phase and integer delay corrections. 
This approach greatly simplifies the matching filter estimation and, despite its simplicity, nar- 
rows the gap between ID and standard adaptive 2D methods on field data. 

1 Introduction 

Multiples correspond to unwanted coherent events related to wave field reflection bounces on given 
surfaces. We refer to the comprehensive survey in Verschuur [46] for a detailed description. Their 
attenuation ^45] represents one of the greatest challenges in past and present seismic processing, since 
the January 1948 issue of Geophysics. Their processing mainly relies on two types of approaches: 
(1) primary/multiple discrimination based on different velocities and reflections and (2) periodicity 
and predictability. 

The first approach maps data to a transformed domain with minimum overlap between primaries 
and multiples. Transformed domain filters subsequently attenuate multiples or select primaries of 
interest. The filtered data is finally mapped back to data domain, using an appropriate inverse 
transform. Common tools include stacking combined with Normal Move Out (NMO), homomorphic 
filtering ^5], f — k and r — p representations (29| . or alternative breeds — parabolic, hyperbolic 
— of the Radon transform [IE, 41, 28|, with pros &: cons discussion in Kabir and Verschuur (2ll |. 

The second approach includes prediction filters and variations thereof (ssl, [H, [13] , also termed 
identification, updating, shaping, sequential or matching filters in electrical engineering literature 



l33l | or predictive deconvolution \4^. Recently, modeling based techniques, such as surface-related 
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multiple estimatio n ( SRME), have demonstrated excellent performance, allowing data-driven mul- 
tiple removal 45, 50, [23| or model-driven, wave-equation based, multiple removal jsH, EH- These 
methods, based on multiple models, consist in predicting then subtracting multiple events from 
original seismic data. 

To compensate for mismatches, an adaptive subtraction is usually performed : however, 

more direct subtraction, for instance when formulated as an inverse problem, is possible [see, e.g., 
H, The adapted model is commonly obtained through a linear Wiener-type filter [sl] that 

minimizes the energy of the difference between data and model. Although primaries and multiples, 
generated from the same source, are not fully uncorrelated, or orthogonal, least square error (LSE) 
solutions are widespread, mainly due to their computational simplicity. To reduce the distortion 
of primaries due to their cross-correlation with multiples, these matching filters are usually applied 
in two steps in practical cases. Long filters compensate global amplitude, waveform and time-shift 
differences, then shorter filters correct time variant discrepancies. 

The observation on primaries and multiples cross-correlation leads to more hybrid methods, 
mixing both transform and prediction approaches. Improvements may reside in different strategies. 
Either the use of more robust objective matching criteria than LSE based ones, such as those based 
on the Li norm [14], or independence-based subtraction (22| are efficient alternatives. Simultaneous 
estimation of sets of filters based on a broader quantity of information is another option, e.g., in 
multichannel adaptation jil]. A suitable data representation may further serve adaptive subtraction 
(Taner [38] with radial suppression, Berkhout and Verschuur [J] with focal transform). A recent 
trend focuses on multiscale or wavelet-like approaches [l9|, which may better promote sparsity 
in exploiting slight differences in primaries and multiple spectra. Pokrovskaia and Wombell |32l |. 
Ahmed jl] use standard discrete wavelet transforms. Herrmann and Verschuur yj|], Donno et al. 
jsl, Neelamani et al. (26| follow a twist towards curvelets, sometimes regarded as a local multiscale 
Radon transform 

In many cases, the choice of the type and length of the matching filter, whatever the domain, 
needs careful parameter selection and testing, and requires computational resources. With our 
method, we aim at overcoming this issue by decomposing a complicated wide-band problem into 
a set of more tractable narrow-band problems, through a wavelet transform. In the proposed 
approach, model-based multiple removal is solely based on ID traces, and hence does not compete 
directly with higher dimensional [l^l , regularized [lil, EH or robust approaches [3| • More precisely, 
comparing with exis ting literature, we retain the followin g th ree concepts: 1) non-stationary Wiener 
matching filters (23, Elf, 2) complex trace processing (25, 49, 18|, 3) continuous wavelet frames (36| . 

The adopted complex Morlet wavelet frame emulates limited-scale complex derivatives (25|. The 
adaptation in the wavelet domain is p erformed at each scale with a complex "unary" filter, e.g., a 
one-coefficient matching operator (30|], accounting for localized phase and amplitude variations be- 
tween data and model. The flexible redundancy and the "complex trace" effect of this wavelet frame 
allow a better management of time variability in model misalignment errors, at the price of a less 
intuitive interpretation of the designed subtraction operator, due to the complex interplay between 
unary filters and wavelet decomposition. The complex curvelet method, proposed by Neelamani 
et al. (26| . can be viewed as performing unary Wiener filtering in a complex- valued curvelet domain. 
With the proposed algorithm, we aim at a better understanding of the intricate relationship between 
a sparsifying transform and its associated matched filtering. Its flexibility both resides in choices 
for a quasi-analytic wavelet and its scale discretization, to better adapt variations in data character 
and sampling. The algorithm alleviates some constraints of the curvelets (either real or complex) 
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such as a relatively high memory footprint and its fixed dyadic frequency decomposition. However, 
as strictly ID, it does not capitalize the information given by neighboring traces. Despite this, 
the closed-form of the matching filter may be beneficial in keeping low complexity, computational 
speed and possible parallelization. Being nevertheless rather competitive to 2D techniques, it may 
fit sparsely or non-uniformly sampled datasets. 

2 Methodology 

A classical ID trace observation model is: 

d[n] — p[n] + m[n] + w[n] , (1) 

where d[n]^ m[n] and w[n] denote the recorded data, primary events, multiples and background 
noise, respectively, at discrete time index n. 

2.1 Complex wavelet transform decomposition 

We perform a time-scale decomposition of each data d[n] and multiple model x[n] traces with discrete 
wavelet frame approximations to continuous wavelet transform (CWT). We refer to Jorgensen and 
Song (20I and references therein for comparison between discrete and continuous wavelet transforms. 
We choose the complex Morlet wavelet since it yields a simple interpretation of amplitude and phase 
delay in the transformed domain. 

The mother Morlet wavelet is approximately analytic and writes: 

^(t) = ^-V4e-^^otg-t2/2 ^ (2) 

where cjq is the central frequency of the modulated Gaussian, t denoting the continuous time variable. 
A standard choice of 7r^2/ ln(2) for the central frequency halves side-lobe amplitude with respect 
to the main lobes. 

The associated discrete family of functions is defined as a sampling of the mother wavelet: 

rrM-^^J '^^ZT" ) ' (3) 

with T the sampling rate and V the number of voices per octave. Indices r,j ^ Z and v G 
[0, . . . , y — 1] denote, respectively, discretized time, octave, and voice. Finally, 60 stands for the 
sampling period at scale zero, j = = 0. Its redundancy, of approximately 2V/bQ^ may be adapted 
to the expected computational efficiency. 

The time-scale representation of trace d[n]^ for instance, is given by the inner product: 

d = = <d[n],CjN> = Y.d[n]i^], (4) 

n 

and written in bold, with indices dropped for more generality, accounting for the whole trace, or 
a windowed portion of it. Windowing in time-scale domain improves the localization in complex 
unary filters estimation. Expression ip'^j[n] denotes the complex conjugate of ip^jl^]- 
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Suppose d results from some time-scale correction or processing of d, here model matching and 
subtraction. Then the resulting filtered trace is synthesized back to the time domain as a sum of 
the dual frame components, 

'^M^EE^i^^jN, (5) 

r j> 

weighted by the corrected signal rf^^-, for instance here with multiples subtracted. Illustrations of 
corresponding time-scale raw and processed data are provided in j43|. With sufficient redundancy, 
^l)[n] can be approximated by up to a constant multiplicative factor [see, e.g.. |48|. 

The discriminative power of the wavelet frame helps reformulate the design of a long matching 
filter into combined global and local optimum complex unary filters, minimizing the error between 
multiple events and their matched model. The straightforward group-delay estimation allowed by 
Morlet wavelets, together with frame redundancy, helps avoiding reconstruction artifacts observed 
with non-redundant discrete wavelet processing |12| . 



2.2 Unary filter estimation: ampfitude and delay 

When the group delay difference between model and multiple sequences is less than half a period at 
all scales, it can be corrected in the time-scale domain following an LSE approach. The optimum 
unary filter at a given wavelet scale, either in local or global windowed portions of the trace, is 
defined as the complex scalar a which, multiplied by the time-scale decomposed multiple model x, 
minimizes the mean square error ^(a) with the decomposed input dataset d: 

<^opt — argmin^(a) = argmin ||d — ax||^ (6) 

a a 

where the phase of aopt compensates small delays. 

This method is efficient, provided the above small group delay condition is satisfied (igj. Other- 
wise, i.e., when the cross-correlation peak departs from the zero delay sample, faulty aopt estimation 
in equation [6] requires a redesign of the ffiter. 

Two options can be conceived to add robustness against high fractional delays, while keeping 
with the unary filter approach: either to design filters with a high group delay, or to add a gross group 
delay correction first and then to apply the previous unary ffiter for fine group delay correction. In 
contrast with the Fourier transform, wavelets' fast decay strongly limits the maximum group delay 
correction on the ffist option. Hence, we focus exclusively on the second one. However, high group 
delay filters enable a relatively simple scale-wise unary filter design, that could reinforce matching 
in noisy scenarios, in applications where the specifications of group-delay range are smaller than the 
effective time support of the chosen wavelets. 

We thus introduce a delay term into the above unary filter to yield higher delay robustness, 
while keeping an LSE criterion. This amounts to finding optimum a and delay / that minimize: 

^(a,/) ^^\(f.[r]-a]x)[r-l]\^ = ||d -ax^f , (7) 

r 

where x/ denotes a delay of x by / samples, sequences (ij[r] and x^-[r] are complex and the ^ symbol 
denotes a locally weighted sum along odd L consecutive samples around index r. We opt to drop 
all subscripts for aopt foi* clarity. It is understood that this algorithm may be applied either locally 
or globally, depending on problem constraints. 
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For a given delay /, optimum aopt yields orthogonality between filtered signal y = d — aoptX; 
and adapted model ax^, which writes in inner product form as: 



(d - ttoptx^, ax/) =0 Va 7^ . 



(8) 



By developing and applying linearity on the first argument and conjugate linearity on the second 
one, we obtain: 

a((d,x^) - ttopt (x^x^)) = Va^O (9) 
and therefore 



Oopt [I] 



2 ' 



(10) 



i.e., the unary Wiener filter solution for complex signals. 

Note that instead of performing a direct measurement of the fractional delay, or equivalently 
of the group delay, we estimate the phase delay at each scale or frequency component. When the 
signal-to-noise ratio (S/N) is high, all meaningful phase components are sufficiently well estimated. 
However, when the S/N decreases, moderate phase errors made at key scale components may induce 
huge errors on group delay. As a consequence, we have to ensure that the phase of aopt varies 
smoothly in scale for important components. 

Several criteria, well adapted to the nature of the seismic signals, can be chosen in the selection 
of the optimum delay. The optimum integer delay, in LSE sense, is selected among all sub-optimal 
<^opt[^] as the one minimizing ^(a,/): 



/opt = argmin^(aopt[/]) 
I 



(11) 



Alternatively, due to the importance of waveform over amplitude, we can define the optimum delay 
as the one that maximizes the normalized cross-correlation between the adapted model and the 
data, commonly called coherence 27, 39. lisl]. 



^opt = argmax C[l] 
I 



being 



C[l] = Re 



(d,aopt [/]x/) 



(12) 



(13) 



|d|| ||aopt[/]x/| 

where we focus on the real part of the normalized cross-correlation. 

The use of unary filters enables two important simplifications of the previous equations. Apply- 
ing conjugate linearity in the second argument of the inner product and positive scalability of the 
norm gives: 

^ (d,x;) 



^opt — arg max Re 



Q^opt [I] 

|aopt[/]| ||d|| ||x/| 



(14) 



Namely, the normalized cross-correlation between the adapted model and the data only depends 
on the phase of the optimum unary filter and the normalized cross-correlation between the original 
model and the data. 

Slighty better results can be obtained with an independent fine selection of the window functions 
used in the estimation of the unary filter and the normalized cross-correlation, equations [TOl and [T3l 
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Figure 1: Subtraction results on a synthetic dataset with two events (a) with multiple model (b), 
using: (c) standard ID adaptation, (d) ID complex unary filters in time-scale, and (e) standard 2D 
adaptation. 



However, even more important to us, when these weighted sums are chosen equal, the normalized 
cross-correlation of the corrected sequence is equivalent to the cosine of the angle between d and 
X/. This can be shown noticing that aopt[/]/ |ttopt[^]| can be written depending on d and x/ from its 
definition in equation \T0\ 

^optW ^ (d,x^) 

|aopt[/]| |(d,x,)r ^ > 

As a consequence, the normalized cross-correlation is totally independent of the optimum unary 
filter in this particular case, 

/opt = arg max -r— — - (16) 
I \W\ W^lW 

and the coherence value measured is always real and positive. When no window is used, this criterion 
reduces to maximum cross-correlation between data and predicted multiples. 

This configuration enables an important reduction on the computational cost when the normal- 
ized cross-correlation estimator is used, as it enables a direct estimation of the optimum integer 
delay, without having to estimate the optimum unary filter for each candidate. Concisely, we first 
gauge an integer delay, equation [161 to rectify large time shifts; we then estimate the optimum 
unary filter, equation [T0\ to shrink amplitude and remaining delay differences. 

3 Examples 

Figure ^ represents synthetic data with two intersecting dipping events with wavelets of equal 
amplitude and frequency content; Figure [TJ) shows the multiple model. As expected, ID algorithms 
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a) Receiver number b) Receiver number 
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Figure 2: Portion of common shot gather 1551. (a) Recorded data, (b) Multiple model. 



(ID filters computed in ID data windows) behave similarly and fail to properly cope with the inter- 
section. However, a dataset showing single primary and multiple events with identical wavelet shape 
is not completely realistic. In practice, variations in wavelet shapes associated with uncorrelated 
time arrivals of primaries and multiple greatly contribute to the success of adaptation in a given 
time frame. Thus, better behavior of the proposed algorithm on actual data is expected. Firstly, it 
enhances primaries and multiples discrepancies in the wavelet domain. Secondly, it enables cross- 
correlation reduction using larger window frames containing several signals, as demonstrated in the 
following field example. 

The examples shown in Figures [2ti and[3ti are taken from a 3D real marine dataset, in common 
shot and channel gather, respectively. The main objective is to uncover primaries masked by strong 
multiple events using a multiple model. The 3D-SRME model [31] shown in Figures [2)3 and [3b 
achieves good precision on the near and mid offset range. We compare the proposed method based 
on ID complex unary filters in wavelet domain, with reference algorithms based on standard ID 
and 2D adaptive filters. Figures [4] and [5] represent subtraction results in common shot and channel 
gathers, respectively. Additional details are provided in Figures [6] (for improved visual amplitude 
assessment) and [7] (for accurate waveform comparison), to better illustrate the main features in data 
and differences between algorithms. 

With a standard two-step method, the global step mainly contains waveform and time-shift 
corrections, combined with a global amplitude adaptation factor. In 2D, it is obtained with large 
2D windows and long 2D filters. The global step is followed by a local adaptation, to account 
for local amplitude and phase discrepancies (in 2D, with small 2D windows and short ID filters). 
Reference 2D algorithm parameters are taken from the best result obtained in an independent 
benchmark. Despite not being used in practice, the ID adaptive filter is provided for completeness 
only. 

With the proposed one-step method, a Morlet wavelet frame (with parameters ujq = 6.4 rad/s, 
j G [1,4], G [0,3] and bo = 1) is used to decompose the data and reconstruct the filtered signal. 
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a) Shot number 
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Figure 3: Portion of first channel gather, (a) Recorded data, (b) Multiple model. 
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a) Receiver number b) Receiver number c) Receiver number d) Receiver number 
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Figure 4: Subtraction results on common shot gather 1551. Filtered data (a) with the standard ID 
adaptive filter, (b) with ID complex unary filters in the time-scale domain, and (c) with standard 
2D adaptive filter in the time-space domain, (d) Difference between (b) and (c). P and M mark 
discrepancies on primary and multiple events, respectively. 

In the wavelet transformed domain, the optimum unary filter estimation for each sample uses a 
rectangular window of 0.636 s. Additionally, to provide robustness against delays higher than half 
the signal period (at each scale), we have chosen the maximum normalized cross-correlation criterion 
within a range of ±12 ms. This subtraction operator may compensate for additional distortions as 
well, e.g., wavelet squaring introduced in convolution-based multiple models, since the estimation 
of the inverse wavelet is implicit in the estimation of an LSE filter. 

Being single-pass, wavelet-based unary Wiener filters achieve slightly better non-coherent noise 
level (e.g., in Figure [6j). As pointed out in ellipses in Figure [5l standard ID lacks in terms of primary 
preservation and multiple attenuation. While the standard 2D approach exhibits best performance 
to this respect, as shown in the zoom of Figure O the proposed method exhibits close performance, 
despite not using additional information conveyed by neighboring traces. Figure El compares the 
waveforms of data, model, adapted and filtered results, for traces between indices 1470 and 1520 
from Figure [6l We appreciate how the set of unary filters successfully reduces the differences between 
multiple model and actual multiple events, to attain sufficient multiple event attenuation to uncover 
main primary events. The trace overlay indicate the waveforms are well preserved. 

To complete the picture, we provide additional results on near offset stacks. Figures [8] to [10] 
compare raw data and the different adaptive multiple subtraction methods. Note that stacking re- 
moves a large portion of the multiples. The performance is thus evaluated on finer details, especially 
in the difference between raw data and filtered sections, as shown in Figure [THl The proposed ID 
wavelet-based complex unary filter appears less aggressive with respect to the primaries previously 
masked by the multiples than the global adaptation step of a standard ID method (after the local 
step, over-adaptation was even higher), compare the pointed signals in Figures [TOk andfTOb. How- 
ever, as previously inferred, the proposed method leaves a little more remnant multiples than the 
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Figure 5: Subtraction results on the first channel gather, (a) Standard ID adaptive filter, (b) ID 
complex unary filters in the time-scale domain, (c) Standard 2D adaptive filter in the time-space 
domain. See Figure [6] for a zoom on the delimited zone. 
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Figure 6: Zoom on traces between shot number 1400 and 1700 of Figure [5l (a) Recorded data, (b) 
Original model. Filtered data with (c) standard ID adaptive filters, (d) ID complex unary filters 
in the time-scale domain, and (e) standard 2D adaptive filters. P marks discrepancies on primary 
events. 



11 




Figure 7: Superimposition of selected waveforms, (a) Recorded data and original model with a small 
positive delay, (b) Recorded data and adapted model with ID complex unary filters in time-scale, 
(c) Recorded and filtered data. 



a) Shot number 
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Figure 8: Near offset stack of the raw data. P and M mark primary and multiple events, respectively. 
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a) Shot number 
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Figure 9: Near offset stacks of filtered data using (a) standard ID adaptive filters, (b) ID compl 
unary filters in time-scale, and (c) standard 2D adaptive filters. 
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Shot number 
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Figure 10: Near offset stacks of the difference between the raw data and the filtered data using 
(a) standard ID adaptive filters, (b) ID complex unary filters in time-scale, and (c) standard 2D 
adaptive filters. P marks overadaptation to primary events. 
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standard 2D method, as it can be appreciated by comparing figures around shot number 1200 and 
above. 

The continuous wavelet frame provides an inherent scale- adaptive windowing that, when com- 
bined with the ID complex unary filter, offers obvious improvements over standard ID LSE time- 
varying frequency filters, and compares promisingly, at least away from isolated event crossings, 
with the industry standard 2D method on the common-channel gather, where a perfect removal of 
the source effects can still be demanding. Additional constraints may be added in the scale and in 
the space dimensions to improve the fractional delay estimation and reinforce the lateral coherence 
of the method, either with adaptive (qI, 13] or low-redundancy multiscale transforms [6], which are 
robust to correlated noises and possess finer bandwidth selectivity than dyadic scales. 

4 Conclusion 

We propose a model-based multiple subtraction which combines complex Morlet wavelet frame with 
complex unary Wiener filters. This subtraction method counter-balances the redundancy provided 
by this complex wavelet frame with the freedom of designing fast filters which allow for simple 
non-stationary model adaptation. Complex unary filters with a low varying phase along scale can 
adapt fractional delays up to half the signal period at a given frequency. To add robustness against 
higher delays, we integrate an integer delay term in the kernel of the unary filters, which can be 
estimated globally or locally, depending on the application. Despite not competing with more 
advanced multiple removal methods, either regularized or in higher dimension spaces, the results of 
ID unary filters in time-scale compare promisingly with standard 2D LSE methods. The structure 
of the algorithm allows for reduced memory footprint, higher code parallelization, and is potentially 
suitable to sparse acquisition. Further developments foresee additional diversity borrowed from 
neighboring traces, to improve apparent velocity separation and residual noise attenuation. The 
presented multiple removal technique aims at updating the portfolio of hybrid demultiple algorithms. 
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